Goto

Collaborating Authors

 activation pattern




incorporate feedback into our final revision. 4 [R1]: " I don't exactly see if small batch vs large batch captures this phenomenon; if yes should say explicitly. "

Neural Information Processing Systems

We thank the reviewers for the detailed and insightful reviews. As the reviews noted, our work 1) introduces "novel Smith et al. [2017] make an explicit connection between small vs. large batch "A small discussion on if the phenomenon has been observed for different datasets/tasks with different optimizers" The phenomenon may not be true for other optimizers such as Adam, though. "concept of "memorizable and generalizable", though intuitive, is sketchy and not formally explained ... authors We acknowledge that the terms "memorizable" and "generalizable" are potentially confusing. We will revise our terminology to clarify this distinction. By "inherently noisy", we refer to the fact that high noise in the datapoints will necessitate larger sample complexity.



Deep ReLU Networks Have Surprisingly Few Activation Patterns

Neural Information Processing Systems

The success of deep networks has been attributed in part to their expressivity: per parameter, deep networks can approximate a richer class of functions than shallow networks. In ReLU networks, the number of activation patterns is one measure of expressivity; and the maximum number of patterns grows exponentially with the depth. However, recent work has showed that the practical expressivity of deep networks - the functions they can learn rather than express - is often far from the theoretical maximum. In this paper, we show that the average number of activation patterns for ReLU networks at initialization is bounded by the total number of neurons raised to the input dimension. We show empirically that this bound, which is independent of the depth, is tight both at initialization and during training, even on memorization tasks that should maximize the number of activation patterns. Our work suggests that realizing the full expressivity of deep networks may not be possible in practice, at least with current methods.


A Combinatorial Perspective on the Optimization of Shallow ReLU Networks

Neural Information Processing Systems

The NP-hard problem of optimizing a shallow ReLU network can be characterized as a combinatorial search over each training example's activation pattern followed by a constrained convex problem given a fixed set of activation patterns. We explore the implications of this combinatorial aspect of ReLU optimization in this work. We show that it can be naturally modeled via a geometric and combinatoric object known as a zonotope with its vertex set isomorphic to the set of feasible activation patterns. This assists in analysis and provides a foundation for further research. We demonstrate its usefulness when we explore the sensitivity of the optimal loss to perturbations of the training data. Later we discuss methods of zonotope vertex selection and its relevance to optimization. Overparameterization assists in training by making a randomly chosen vertex more likely to contain a good solution. We then introduce a novel polynomial-time vertex selection procedure that provably picks a vertex containing the global optimum using only double the minimum number of parameters required to fit the data. We further introduce a local greedy search heuristic over zonotope vertices and demonstrate that it outperforms gradient descent on underparameterized problems.


Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks

Neural Information Processing Systems

The convergence of GD and SGD when training mildly parameterized neural networks starting from random initialization is studied. For a broad range of models and loss functions, including the widely used square loss and cross entropy loss, we prove an ''early stage convergence'' result. We show that the loss is decreased by a significant amount in the early stage of the training, and this decreasing is fast. Furthurmore, for exponential type loss functions, and under some assumptions on the training data, we show global convergence of GD. Instead of relying on extreme over-parameterization, our study is based on a microscopic analysis of the activation patterns for the neurons, which helps us derive gradient lower bounds. The results on activation patterns, which we call ``neuron partition'', help build intuitions for understanding the behavior of neural networks' training dynamics, and may be of independent interest.


Neural expressiveness for beyond importance model compression

Maroudis, Angelos-Christos, Xydis, Sotirios

arXiv.org Artificial Intelligence

Neural Network Pruning has been established as driving force in the exploration of memory and energy efficient solutions with high throughput both during training and at test time. In this paper, we introduce a novel criterion for model compression, named "Expressiveness". Unlike existing pruning methods that rely on the inherent "Importance" of neurons' and filters' weights, ``Expressiveness" emphasizes a neuron's or group of neurons ability to redistribute informational resources effectively, based on the overlap of activations. This characteristic is strongly correlated to a network's initialization state, establishing criterion autonomy from the learning state stateless and thus setting a new fundamental basis for the expansion of compression strategies in regards to the "When to Prune" question. We show that expressiveness is effectively approximated with arbitrary data or limited dataset's representative samples, making ground for the exploration of Data-Agnostic strategies. Our work also facilitates a "hybrid" formulation of expressiveness and importance-based pruning strategies, illustrating their complementary benefits and delivering up to 10x extra gains w.r.t. weight-based approaches in parameter compression ratios, with an average of 1% in performance degradation. We also show that employing expressiveness (independently) for pruning leads to an improvement over top-performing and foundational methods in terms of compression efficiency. Finally, on YOLOv8, we achieve a 46.1% MACs reduction by removing 55.4\% of the parameters, with an increase of 3% in the mean Absolute Precision ($mAP_{50-95}$) for object detection on COCO dataset.


Training Introspective Behavior: Fine-Tuning Induces Reliable Internal State Detection in a 7B Model

Rivera, Joshua Fonseca

arXiv.org Artificial Intelligence

Lindsey (2025) investigates introspective awareness in language models through four experiments, finding that models can sometimes detect and identify injected activation patterns -- but unreliably (~20% success in the best model). We focus on the first of these experiments -- self-report of injected "thoughts" -- and ask whether this capability can be directly trained rather than waiting for emergence. Through fine-tuning on transient single-token injections, we transform a 7B parameter model from near-complete failure (0.4% accuracy, 6.7% false positive rate) to reliable detection (85% accuracy on held-out concepts at α=40, 0% false positives). Our model detects fleeting "thoughts" injected at a single token position, retains that information, and reports the semantic content across subsequent generation steps. On this task, our trained model satisfies three of Lindsey's criteria: accuracy (correct identification), grounding (0/60 false positives), and internality (detection precedes verbalization). Generalization to unseen concept vectors (7.5pp gap) demonstrates the model learns a transferable skill rather than memorizing specific vectors, though this does not establish metacognitive representation in Lindsey's sense. These results address an open question raised by Lindsey: whether "training for introspection would help eliminate cross-model differences." We show that at least one component of introspective behavior can be directly induced, offering a pathway to built-in AI transparency.


One SPACE to Rule Them All: Jointly Mitigating Factuality and Faithfulness Hallucinations in LLMs

Wang, Pengbo, Li, Chaozhuo, Wang, Chenxu, Zheng, Liwen, Zhang, Litian, Zhang, Xi

arXiv.org Artificial Intelligence

LLMs have demonstrated unprecedented capabilities in natural language processing, yet their practical deployment remains hindered by persistent factuality and faithfulness hallucinations. While existing methods address these hallucination types independently, they inadvertently induce performance trade-offs, as interventions targeting one type often exacerbate the other. Through empirical and theoretical analysis of activation space dynamics in LLMs, we reveal that these hallucination categories share overlapping subspaces within neural representations, presenting an opportunity for concurrent mitigation. To harness this insight, we propose SPACE, a unified framework that jointly enhances factuality and faithfulness by editing shared activation subspaces. SPACE establishes a geometric foundation for shared subspace existence through dual-task feature modeling, then identifies and edits these subspaces via a hybrid probe strategy combining spectral clustering and attention head saliency scoring. Experimental results across multiple benchmark datasets demonstrate the superiority of our approach.